Abstract

Differential optical absorption spectroscopy (DOAS) is a powerful tool for 
detecting and quantifying trace gases in atmospheric chemistry [18]. DOAS 
spectra consist of a linear combination of complex multi-peak multi-scale struc- 
tures. Most DOAS analysis routines in use today are based on least squares 
techniques, for example, the approach developed in the 1970s [HI [151 HI EZ] 
uses polynomial fits to remove a slowly varying background (broad spectral 
structures in the data), and known reference spectra to retrieve the identity 
and concentrations of reference gases [19]. An open problem [18] is to identify 
unknown gases in the fitting residuals for complex atmospheric mixtures. 

In this work, we develop a novel three step semi-blind source separation 
method. The first step uses a multi-resolution analysis to remove the slow- 
varying and fast-varying components in the DOAS spectral data matrix X. 
The second step decomposes the data X preprocessed in the first step into a
linear combination of the reference spectra plus a remainder, or X = AS + R,
where the columns of matrix A are known reference spectra, and the matrix S
contains the unknown non-negative coefficients that are proportional to
concentration. The second step is realized by a convex minimization problem
S = arg min norm(X - AS), where the norm is a hybrid $\ell_1/\ell_2$ norm
(Huber estimator) that helps to maintain the non-negativity of S. The third step
performs a blind independent component analysis of the remainder matrix R 
to extract remnant gas components. We first illustrate the proposed method in 
processing a set of DOAS experimental data by a satisfactory blind extraction 
of an a-priori unknown trace gas (ozone) from the remainder matrix. Numerical 
results also show that the method can identify multiple trace gases from the 
residuals. 
1 Introduction



Trace gases play an important role in climate change and air quality of the Earth's 
atmosphere. Spectroscopic techniques are widely used today for measurements of 
many trace species, and have evolved over the past century from the first use of the 
sun as a light source to identify atmospheric trace gases. Many different light sources 
(e.g., infrared and UV-visible lamps, lasers, and natural sources such as the sun) are
now conventionally used to identify light-absorbing species as well as determine their
concentrations using Lambert-Beer's law,

$$I(\lambda) = I_0(\lambda)\,\exp\bigl(-\sigma(\lambda)\,\rho\,L\bigr), \tag{1.1}$$

where $I_0(\lambda)$ is the initial intensity of light and $I(\lambda)$ is its intensity after traveling through
a sample of path length $L$ with concentration $\rho$. Each species has its characteristic
absorption cross section $\sigma(\lambda)$, a measure of its ability to absorb light that varies with
wavelength. The use of (1.1) is convenient for multi-component samples in laboratory
spectrometers, but it is more difficult to determine the value of $I_0(\lambda)$ in the atmosphere
over a large wavelength range.
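As a small illustration of (1.1), the following Python sketch evaluates the transmitted intensity for a single absorber and then inverts the law to recover the number density when the initial intensity is known. The function names and all numerical values are hypothetical and chosen only for illustration; they are not taken from the experiments in this paper.

```python
import math

def transmitted_intensity(i0, sigma, rho, path_length):
    """Lambert-Beer law (1.1): I = I0 * exp(-sigma * rho * L)."""
    return i0 * math.exp(-sigma * rho * path_length)

def number_density(i, i0, sigma, path_length):
    """Invert (1.1) for rho, given a measured I and a known I0."""
    return math.log(i0 / i) / (sigma * path_length)

# Hypothetical values, for illustration only.
sigma = 5.0e-19       # absorption cross section, cm^2 per molecule
rho = 2.5e12          # number density, molecules per cm^3
path_length = 5.2e3   # path length in cm

i = transmitted_intensity(1.0, sigma, rho, path_length)
rho_back = number_density(i, 1.0, sigma, path_length)
```

The inversion only works when the initial intensity is available, which is exactly the quantity that is hard to obtain in the open atmosphere.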

A new method, differential optical absorption spectroscopy (DOAS), was intro- 
duced in the 1970s [HI [151 [HI IH] to analyze atmospheric trace gas concentrations. 
DOAS analysis separates the trace gas absorptions, which typically vary quickly with 
wavelength, from features that vary slowly with wavelength, e.g., light scattering pro- 
cesses by molecules and aerosols. Differential cross sections are then defined relative 
to this new broad background in place of the true $I_0(\lambda)$. Several important trace gases
were measured for the first time with DOAS, e.g., HONO, NO3, BrO, ClO in the
troposphere, and OClO and BrO in the stratosphere. A large number of other molecules
absorb in the UV and the visible wavelength region and most aromatic hydrocarbons 
can also be detected. An advantage of DOAS is the ability to measure absolute trace 
gas concentrations in situ. DOAS is therefore especially useful for measuring highly 
reactive species such as the free radicals OH, NO3, or BrO, and it provides a powerful 
tool for studying emissions, transformation and transport of chemicals throughout 
the troposphere and stratosphere. It can also help to understand the influence of
atmospheric chemistry on climate and air quality. A detailed description of DOAS
can be found elsewhere [15].

In general, DOAS spectra contain overlapping absorption structures which consist 
of complex multiple scales and peaks. They must be separated by the analysis routine 
to retrieve the concentrations of the trace gases. Least squares techniques are most 
often used for analysis of DOAS spectra, with the use of high pass filters to fit or 
separate out the slowly varying components. For example, the approach described in
[15] applies a polynomial fit to remove the broad (slow-varying) spectral features, and
known reference spectra to retrieve the concentrations of reference gases. However,
the existing DOAS approaches have two limitations: 1) the condition of least squares
(that errors are normally distributed) is often violated, which suggests that a norm
other than the $\ell_2$ (least squares) norm should be used; 2) the fitting residuals for
atmospheric samples are in most cases not pure noise due to imperfect references, 
atmospheric turbulence, instrument effects, and unknown trace gases. Among other 
interesting problems, the identification of gas structures in the fitting residuals is of 
great importance. The method in this paper has been developed to address these
issues, in the hope of providing a tool for atmospheric chemists to analyze the residuals
for possible hidden trace gases. The method is designed to deal with the follow- 
ing three major challenges. First, DOAS spectra are complex multi-scale multi-peak 
structural data containing slow-varying features, structured signals due to the trace 
gases, and instrumental noise. Hence a multi-resolution analysis tool is needed for 
scale decomposition. Second, the identification of gases from the residuals is actually 
a problem of blind source separation (BSS) as both the trace gases (including their 
numbers) and mixing process are not known. A major problem is to find a working 
assumption on the source (hidden trace gas) signals and effective BSS algorithms. 
Third, the new objective function for data fitting should not only overcome the limi- 
tations of least squares fitting, but also help to maintain the non-negativity. To tackle 
these problems, we have made an initial attempt of developing a semi-blind approach 
which contains three steps. The first step uses multi-resolution analysis to remove the
very slow (e.g., scattering) and very fast (noise) components in the DOAS spectral
data matrix X. The second step decomposes the data X preprocessed in the first step
into a linear combination of the reference spectra plus a remainder, or X = AS + R,
where the columns of matrix A are known reference spectra and the matrix S contains
the unknown non-negative coefficients. The second step is carried out by solving a
convex minimization problem S = arg min norm(X - AS), where the norm is a hybrid
$\ell_1/\ell_2$ norm that helps to maintain the non-negativity of S. The third step performs a
blind independent component analysis of the remainder matrix R to extract remnant 
gas components. Our method can be useful for separating unknown sources from the 
residuals after any known reference spectra have been first deployed to fit the data. 

The paper is organized as follows. In section 2, we review the essentials of DOAS 
and the existing approach, then introduce our method. In section 3, we illustrate the 
proposed method in processing a set of DOAS experimental data, and show satisfac- 
tory numerical results. Concluding remarks are in section 4. 



2 DOAS and Signal Processing Methods 



2.1 DOAS and Fitting Methods 

A typical experimental setup for a DOAS instrument consists of a continuous light 
source, e.g., a Xe-arc lamp, a light-absorbing sample (the atmosphere or gases in 
a chamber), a grating spectrometer, and an optical detector. It is also possible to 
use the light from the sun or moon, or scattered sun light as light sources [TU [T5j . 
The typical length of the light path in the atmosphere ranges from several hundred
meters to many kilometers, whereas it is < 100 m in laboratory DOAS experiments. The
light of intensity $I_0(\lambda)$ passes through the sample, is typically dispersed by a grating
spectrometer, and is measured by a detector. On its way through the sample, the
light undergoes extinction due to absorption processes by trace gases and scattering
by air molecules and aerosol particles. In the atmosphere, the intensity $I(\lambda)$ at the
end of the light path is given by Lambert-Beer's law,



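$$I(\lambda) = I_0(\lambda)\exp\left[-\int_0^L\Bigl(\sum_j \sigma_j(\lambda)\,\rho_j(l) + \varepsilon_R(\lambda,l) + \varepsilon_M(\lambda,l)\Bigr)\,dl\right] + N(\lambda), \tag{2.1}$$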



where $\sigma_j$ is the absorption cross section of trace gas $j$, $\rho_j$ is its number density, and $L$
is the length of the light path. The Rayleigh extinction by gases and the Mie extinction
by aerosols are described by $\varepsilon_R$ and $\varepsilon_M$, and $N(\lambda)$ is the measurement noise. The
basic idea of DOAS is the separation of the cross section $\sigma_j = \sigma_j^b + \sigma'_j$, in which $\sigma_j^b$
represents broad spectral features and the differential cross section $\sigma'_j$ represents
narrow spectral structures that are of interest for identification and quantification
of the trace gases. If one considers only $\sigma'_j$, interferences with Rayleigh and Mie
extinction are avoided. The measurement by the spectrometer is described mathematically
by a convolution of $I(\lambda)$ with the instrument function $H$ of the spectrometer,

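$$I^*(\lambda) = I(\lambda) * H = \int_{-\Delta\lambda}^{\Delta\lambda} I(\lambda-\lambda')\,H(\lambda')\,d\lambda'$$

$$= \int_{-\Delta\lambda}^{\Delta\lambda} I_0(\lambda-\lambda')\exp\left[-\int_0^L\Bigl(\sum_j \sigma_j(\lambda-\lambda')\,\rho_j(l) + (\varepsilon_R+\varepsilon_M)(\lambda-\lambda',l)\Bigr)\,dl\right] H(\lambda')\,d\lambda'$$

$$= \int_{-\Delta\lambda}^{\Delta\lambda} I'_0(\lambda-\lambda')\exp\left[-\int_0^L \sum_j \sigma'_j(\lambda-\lambda')\,\rho_j(l)\,dl\right] H(\lambda')\,d\lambda',$$

where

$$I'_0(\lambda-\lambda') = I_0(\lambda-\lambda')\exp\left[-\int_0^L\Bigl(\sum_j \sigma_j^b(\lambda-\lambda')\,\rho_j(l) + (\varepsilon_R+\varepsilon_M)(\lambda-\lambda',l)\Bigr)\,dl\right]$$

describes the broad spectral structures due to the characteristics of the light source
$I_0$, the Rayleigh and Mie extinction, and the broad absorption by trace gases. $I'_0$ is
a slow-varying function of wavelength, so $I^*(\lambda)$ can be approximated by

$$I^*(\lambda) \approx I'_0(\lambda)\int_{-\Delta\lambda}^{\Delta\lambda}\exp\left[-\sum_j \sigma'_j(\lambda-\lambda')\int_0^L \rho_j(l)\,dl\right] H(\lambda')\,d\lambda'$$

$$\approx I'_0(\lambda)\exp\left[-\sum_j \bigl(\sigma'_j * H\bigr)(\lambda)\int_0^L \rho_j(l)\,dl\right]$$

$$= I'_0(\lambda)\exp\left[\sum_j s_j\,A_j(\lambda)\right],$$

where $s_j = \int_0^L \rho_j(l)\,dl$, and $A_j(\lambda) = -\sigma'_j(\lambda) * H(\lambda)$ denote the narrow absorption
structures of the trace gases measured with the same instrument. Suppose that there
are m known trace gases in the data. The logarithm of the above equation becomes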



$$x(\lambda) = \sum_{j=1}^{m} s_j\,A_j(\lambda) + B'(\lambda) + N'(\lambda), \tag{2.2}$$

where $x(\lambda) = \ln I^*(\lambda)$, and $B'(\lambda) = \ln I'_0(\lambda)$ represents the broad spectral features.
The noise is $N'(\lambda) = \ln N(\lambda)$. In the experiment, the wavelength range is mapped
onto $p$ discrete pixels of the detector. The sampled data points form $p$-dimensional
column vectors of a data matrix X. Suppose there are $n$ measurements; then
$X = [x_1, x_2, \dots, x_n] \in \mathbb{R}^{p\times n}$, whose column vectors are the recorded DOAS data points.
Equation (2.2) in matrix form is:

X = AS + B + N, (2.3) 

where the columns of matrix A correspond to the reference spectra of the known trace
gases; the matrix S contains non-negative coefficients; the matrix B includes the
slow-varying components, and the matrix N contains the noise components. Most DOAS
approaches use least squares methods to calculate S because of their computational
simplicity. Since the data contain both broad and narrow spectral features, a high
pass filter is needed to remove the broad spectral features. It is common to use
polynomials to model and filter out the slowly varying parts from the narrow trace
gas absorption. Equation (2.3) is written as follows in [18].

X = AS + P + N, (2.4) 

where the polynomial P models the broad spectral structures. Given the order of the
polynomial and the known reference spectra, (2.4) can be solved with a least squares
method. The polynomial fitting, however, has the following drawbacks: 1) the order
of the polynomial is determined empirically and different orders might be used for
different data; 2) the non-negativity of the concentration is not guaranteed during the
fitting. An open problem after the fitting is how to identify and extract trace gases
from the fitting residuals besides the noise. To address these issues, we propose a 
three step method in the next section. 
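The conventional fit of model (2.4) can be sketched in a few lines of Python. The example below is only an illustration under stated assumptions: it fits a single synthetic reference spectrum together with a quadratic polynomial background by solving the normal equations, and it is not the MFC software or any other production DOAS routine. The helper names, the synthetic reference `ref`, and the coefficient values are all hypothetical.

```python
import math

def solve_linear_system(m, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = aug[r][n] - sum(aug[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = acc / aug[r][r]
    return sol

def least_squares(columns, x):
    """Minimize ||x - sum_j c_j * columns[j]||_2 via the normal equations."""
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in columns]
            for ci in columns]
    rhs = [sum(u * v for u, v in zip(ci, x)) for ci in columns]
    return solve_linear_system(gram, rhs)

# Hypothetical single-gas example on p = 200 pixels.
p = 200
t = [i / (p - 1) for i in range(p)]            # scaled wavelength axis
ref = [math.sin(40.0 * ti) * math.exp(-((ti - 0.5) ** 2) / 0.02)
       for ti in t]                            # narrow absorption structure
poly = [[ti ** d for ti in t] for d in range(3)]   # columns 1, t, t^2
true_s = 0.7
x = [true_s * r + 0.3 - 0.2 * ti + 0.5 * ti ** 2 for r, ti in zip(ref, t)]

coeffs = least_squares([ref] + poly, x)
s_hat = coeffs[0]   # estimated coefficient of the reference spectrum
```

In this noise-free setting the coefficient is recovered essentially exactly. The drawbacks listed above show up once the polynomial order is chosen poorly or the data contain outliers, and nothing in the fit prevents `s_hat` from becoming negative.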

2.2 Proposed Semi-Blind Source Separation Method 

DOAS data can cover a range of scales and contain high frequency (<1 nm) artifact 
structures, for example due to pixel-to-pixel variability in the detector, while the 
reference spectra of the trace gases contain fewer peaks and peak widths on the order 
of several nm. Hence it is helpful to remove the fast varying artifacts from the spectra 
data by multi-resolution analysis. In addition, the broad features (slow-varying parts) 
in the data need to be eliminated in order to fit the reference spectra of the known 
trace gases. We propose to use the empirical mode decomposition (EMD) to extract 
these components. A detailed description of EMD can be found in [10]. Unlike Fourier
methods, which assume that the data are linear and stationary, EMD makes no such
assumptions about the data and can also handle non-stationary and nonlinear data.

2.2.1 Multi-Resolution Analysis 

EMD has been adopted rapidly in many areas of science and engineering since it was
introduced by Huang et al. [10]. Its key feature is to decompose a signal into
so-called intrinsic mode functions (IMFs). The essential step in extracting an
IMF is to identify an oscillation embedded in a signal from its local time scale.
Considering a signal $s(t)$ between two consecutive local extrema (say, two minima at
times $t_1$ and $t_2$), we can heuristically define a (local) high-frequency part
$\{d(t),\ t_1 \le t \le t_2\}$, where $d(t)$ corresponds to the oscillation terminating at the two
minima and passing through the maximum in between. For the picture to be complete,
we also identify the corresponding (local) low-frequency part, or local trend, $m(t)$, so
that we have $s(t) = m(t) + d(t)$ for $t_1 \le t \le t_2$. Assuming that this is done in some
proper way for all the oscillations in the entire signal, we get an intrinsic mode function
as well as a residual consisting of all local trends. The procedure can then be repeated
on the residual, and the constitutive components of a signal can be iteratively extracted.
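The split of a signal into a local trend and an oscillation can be made concrete with a short Python sketch of one sifting pass. This is a deliberately simplified illustration, not the EMD algorithm of [10]: the envelopes through the local maxima and minima are piecewise linear rather than cubic splines, only a single pass is performed, and the end segments are simply extended. The function names and the synthetic test signal are hypothetical.

```python
import math

def local_extrema(s):
    """Indices of interior local maxima and minima of the sequence s."""
    maxima, minima = [], []
    for i in range(1, len(s) - 1):
        if s[i - 1] < s[i] >= s[i + 1]:
            maxima.append(i)
        elif s[i - 1] > s[i] <= s[i + 1]:
            minima.append(i)
    return maxima, minima

def linear_envelope(knots, s, n):
    """Piecewise-linear curve through (i, s[i]) for i in knots, on 0..n-1.

    Outside the first and last knot the nearest segment is extended.
    """
    env = []
    seg = 0
    for i in range(n):
        # Advance to the segment [knots[seg], knots[seg + 1]] containing i.
        while seg < len(knots) - 2 and i > knots[seg + 1]:
            seg += 1
        a, b = knots[seg], knots[seg + 1]
        slope = (s[b] - s[a]) / (b - a)
        env.append(s[a] + slope * (i - a))
    return env

def sift_once(s):
    """Split s into a local trend m and an oscillation d with s = m + d."""
    maxima, minima = local_extrema(s)
    if len(maxima) < 2 or len(minima) < 2:
        raise ValueError("need at least two maxima and two minima")
    upper = linear_envelope(maxima, s, len(s))
    lower = linear_envelope(minima, s, len(s))
    m = [(u + v) / 2.0 for u, v in zip(upper, lower)]   # mean envelope
    d = [x - y for x, y in zip(s, m)]                   # fast oscillation
    return m, d

# Synthetic signal (hypothetical): slow ramp plus a fast oscillation.
n = 1000
t = [k / n for k in range(n)]
s = [5.0 * tk + math.sin(2.0 * math.pi * 10.0 * tk) for tk in t]
m, d = sift_once(s)
```

For this test signal the mean envelope `m` follows the slow ramp and `d` carries the fast oscillation. Repeating the pass on `m` would peel off progressively slower components, which is the iterative extraction described above.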

The EMD method decomposes the DOAS data into a finite number of components
of different frequencies. The advantage of EMD is that it is completely data-driven
(no need to specify a parameter such as the order of a fitting polynomial), fast and
automatic. Fig. 1 shows a typical DOAS spectrum of a trace gas mixture containing
HONO, whose reference spectrum is also shown in the bottom panel. Fig. 2 shows
the fast and slow components extracted from the DOAS data in Fig. 1. The EMD
preprocessed (high-passed) data X satisfies the following model

X = AS + R, (2.5)

where the columns of matrix A are the reference spectra of the known trace gases,
those of the matrix S contain their concentrations, and R is the fitting residual, which
might contain the instrument noise, hidden trace gas structures, etc. For the
estimation of the concentration matrix S, we minimize the following constrained objective
function:

$$\min_{S}\ \mathrm{norm}(X - AS), \quad \text{s.t. } S \ge 0, \tag{2.6}$$

for a proper choice of the norm.

2.2.2 Huber Estimator and Robust Data Fitting 

There are many kinds of norms available, e.g., $\ell_2$ (least squares) and $\ell_1$ (least absolute
deviations). The regular least squares method (ignoring the non-negativity constraint
on S) is the conventional choice if the unknown noise N is assumed to be Gaussian.
However, it is rather sensitive to outliers in the data; even one outlier can
drastically change the estimate. The least absolute deviations ($\ell_1$ norm) is more robust
to outliers in the data; however, it is less effective if the peaks in N are not isolated
(or sparse). We find that a hybrid $\ell_2$ and $\ell_1$ norm ($\ell_2$ on small peaks and $\ell_1$ on large
peaks), or a Huber estimator [11], is able to resist the influence of outliers and
maintain non-negativity of S for our data fitting task. The Huber norm is both regular
and convex. The corresponding nonlinear function $H = H(x)$ is a parabola ($\ell_2$) in
the vicinity of zero, and increases linearly ($\ell_1$) at $|x| > k$ for a given positive constant $k$.
More precisely,

$$H(x) = \begin{cases} \tfrac{1}{2}x^2, & |x| \le k, \\ k|x| - \tfrac{1}{2}k^2, & |x| > k. \end{cases} \tag{2.7}$$

Figure 1: Top panel is a mixed DOAS spectrum (black) from the experiment and its
smoothed counterpart (blue). Bottom panel is the spectral absorption reference of
trace gas HONO.

Figure 2: The preprocessed DOAS data after removing the slowest and fastest varying
components. The x-axis of the plots is wavelength (nm).

The non-negativity of S under the Huber norm indicates that the choice fits the
empirical distribution of the noise N arising from detection and photon statistics [18].
Fig. 3 is an example showing the superiority of Huber's estimator over least squares: it
is resistant to outliers in the data, while the least squares result deviates significantly
from the exact line due to the two outliers. Least squares assigns equal weighting
to each observation; the weights for the Huber estimator decline when $|x| > k$ (see
the weight functions in Fig. 3). The value $k$ for the Huber estimator is called a
tuning constant; smaller values of $k$ produce more resistance to outliers, but at the
expense of lower efficiency when the errors are normally distributed. The tuning
constant is generally selected to yield high efficiency in the normal case; in particular,
$k = 1.345\sigma$ (where $\sigma$ is the standard deviation of the errors). The minimization of
Huber's objective can be achieved by the method of iteratively reweighted least squares.
For the DOAS data in our numerical experiments, we found that Huber's estimator
produced non-negative solutions S without explicit enforcement of the non-negativity
constraint. The theoretical underpinning is under further study.

In the residual of the Huber estimation, there might be spectral structures (one or many)
of trace gases buried in noise, or just random noise. In either case, we decompose the
residuals in a blind fashion due to the lack of knowledge of the hidden trace gases. The
source signal assumption required for the decomposition is that the spectra of different
trace gases are statistically independent (orthogonal). This appears to be a reasonable
working assumption for many trace gases. Independent component analysis (ICA)
can now be readily applied.
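The robust fitting discussed in this section can be sketched with a small iteratively reweighted least squares (IRLS) loop in Python. The example is an illustration only and assumes a simple setting in the spirit of Fig. 3: a straight-line model, a handful of points, and two gross outliers. The data, the tuning constant `k = 1.0`, and the function names are hypothetical; in practice `k` would be tied to the noise level as described above.

```python
def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = a*x + b; returns (a, b)."""
    sw = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * sxx - sx * sx
    a = (sw * sxy - sx * sy) / det
    b = (sxx * sy - sx * sxy) / det
    return a, b

def huber_weight(r, k):
    """IRLS weight for the Huber function: 1 if |r| <= k, else k/|r|."""
    return 1.0 if abs(r) <= k else k / abs(r)

def huber_line_fit(x, y, k, iterations=50):
    """Minimize the Huber objective for a line by IRLS."""
    a, b = weighted_line_fit(x, y, [1.0] * len(x))   # least squares start
    for _ in range(iterations):
        residuals = [yi - (a * xi + b) for xi, yi in zip(x, y)]
        w = [huber_weight(r, k) for r in residuals]
        a, b = weighted_line_fit(x, y, w)
    return a, b

# Hypothetical data: the line y = -2x + 10 with two gross outliers.
x = [float(i) for i in range(11)]
y = [-2.0 * xi + 10.0 for xi in x]
y[8] += 15.0
y[9] += 15.0

a_ls, b_ls = weighted_line_fit(x, y, [1.0] * len(x))   # plain least squares
a_hub, b_hub = huber_line_fit(x, y, k=1.0)             # Huber estimate
```

In this sketch the least squares slope is pulled strongly toward the outliers, while the Huber fit stays close to the true line because large residuals receive weights smaller than one. The same reweighting idea carries over to the matrix problem (2.6), with the line replaced by the reference spectra in A.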

2.2.3 Independent Component Analysis 

Figure 3: Comparison of Huber's estimator and least squares (their weight functions
are shown in the bottom plots). The data points are generated by y = -2x + 10 plus
noise. Huber: y = -1.9794x + 9.9318; least squares: y = -1.0504x + 3.5819.

ICA is a useful and generic tool for solving blind source separation (BSS) problems,
which arise when one attempts to recover source signals from their mixtures without
knowing the mixing process [5], E]. ICA finds the independent components in the
mixtures by maximizing the statistical independence (minimizing mutual information)
of the estimated components. Mathematically, given the mixture matrix $R \in \mathbb{R}^{p\times n}$
and the number of independent source components $d$, ICA finds a full rank matrix
$W \in \mathbb{R}^{d\times n}$ such that the output matrix $U \in \mathbb{R}^{d\times p}$ given by

$$U = W R^{\top} \tag{2.8}$$

contains rows (recovered source signals) as independent from each other as possible.
Here $n$ is the number of residuals from the data fitting, and $p$ is the number of
wavelength pixels. The rows of U correspond to the decomposed independent
source signals. We may choose one of many ways to approximate independence, and
this choice governs the form of the ICA algorithm. The two broadest definitions of
independence for ICA are: (1) minimization of mutual information; (2) maximization
of non-Gaussianity. The non-Gaussianity family of ICA algorithms uses kurtosis and
negentropy. The minimization of mutual information family of ICA algorithms uses the
Kullback-Leibler divergence and maximum entropy; however, knowledge of the
probability distribution function (PDF) of the source signals is needed. Algorithms for
ICA include infomax [1], FastICA [12], and JADE p]. We opt for JADE because it is based
on cumulants (2nd and 4th order statistics) and the approximate joint diagonalization
of cumulant matrices (hence it does not rely on PDF information of the source signals).
For a moderate number of sources, it is more direct and stable than iterative methods
such as infomax [1] and FastICA [12]. It was recently found [13] that the infomax method
[1] may even diverge and that it only converges in a weak sense under proper rescaling
and soft dynamic control of the iterations. The most attractive aspect of JADE is
that it does not require parameter tuning (e.g., choosing the learning parameter in
the iterative methods). In general, ICA algorithms cannot identify the actual number
of source signals, so this number needs to be found by other means, for example by
human evaluation of the end results. In our decomposition of the Huber residuals, we
tested a range for this number, and pinpointed the one with the most reliable and
meaningful outcomes when calibrated with the knowledge of the existing trace gas
spectral properties.
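To make the role of fourth-order statistics more tangible, the following Python sketch separates two synthetic mixtures. It is not JADE: it is a simplified two-channel illustration that whitens the data (second-order statistics) and then searches over a single rotation angle for the output pair with the largest absolute excess kurtosis (a fourth-order cumulant). The function names, the two synthetic sources, and the mixing coefficients are hypothetical.

```python
import math

def center(v):
    mu = sum(v) / len(v)
    return [x - mu for x in v]

def excess_kurtosis(v):
    """Fourth-order cumulant of a zero-mean, unit-variance sample."""
    return sum(x ** 4 for x in v) / len(v) - 3.0

def whiten(x1, x2):
    """Rotate and rescale two zero-mean signals to identity covariance."""
    n = len(x1)
    c11 = sum(a * a for a in x1) / n
    c22 = sum(b * b for b in x2) / n
    c12 = sum(a * b for a, b in zip(x1, x2)) / n
    # Rotation angle that diagonalizes the symmetric 2x2 covariance matrix.
    phi = 0.5 * math.atan2(2.0 * c12, c11 - c22)
    c, s = math.cos(phi), math.sin(phi)
    e1 = [c * a + s * b for a, b in zip(x1, x2)]
    e2 = [-s * a + c * b for a, b in zip(x1, x2)]
    v1 = sum(a * a for a in e1) / n
    v2 = sum(b * b for b in e2) / n
    return ([a / math.sqrt(v1) for a in e1],
            [b / math.sqrt(v2) for b in e2])

def rotate(z1, z2, theta):
    c, s = math.cos(theta), math.sin(theta)
    return ([c * a + s * b for a, b in zip(z1, z2)],
            [-s * a + c * b for a, b in zip(z1, z2)])

def ica_2d(x1, x2, steps=180):
    """Pick the rotation of the whitened data with maximal |kurtosis|."""
    z1, z2 = whiten(center(x1), center(x2))
    best = None
    for i in range(steps):
        theta = 0.5 * math.pi * i / steps
        u1, u2 = rotate(z1, z2, theta)
        score = abs(excess_kurtosis(u1)) + abs(excess_kurtosis(u2))
        if best is None or score > best[0]:
            best = (score, u1, u2)
    return best[1], best[2]

# Two hypothetical non-Gaussian sources and two linear mixtures of them.
n = 2000
s1 = [((0.137 * k) % 1.0) - 0.5 for k in range(n)]   # sawtooth
s2 = [math.sin(0.05 * k) for k in range(n)]          # sinusoid
x1 = [0.8 * a + 0.6 * b for a, b in zip(s1, s2)]
x2 = [0.3 * a - 0.9 * b for a, b in zip(s1, s2)]

u1, u2 = ica_2d(x1, x2)
```

The recovered signals `u1` and `u2` match the sources only up to order, sign, and scale, which is the usual ambiguity of ICA. JADE generalizes the single-angle search used here to the joint diagonalization of a set of cumulant matrices for more than two channels.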



3 Experiments and Computational results 
3.1 Experimental Setup 

Spectra of chemical mixtures were collected using an environmental chamber [7] for 
which DOAS is one of the analytical techniques used to measure species during
experiments. Fig. 4 shows a simplified schematic of the chamber and optical arrangement
for DOAS. The chamber is 561 L in volume and can be evacuated to a pressure of 
~ 10^^ Torr for collection of true /o('^) spectra. Spectra can also be collected be- 
fore and after addition of ultrapure air and each gaseous analyte of interest through 
various ports. 

The DOAS instrumentation consists of a high pressure Xe arc lamp (Oriel, Model 
6263) as the UV-visible light source. The light beam enters the chamber through 
a quartz window and undergoes multiple reflections using White cell mirrors [21]
through the gas mixture in the chamber. The multiple reflections increase the path 
length of the light beam through the sample to a total path length of L = 52 m. 
The light beam exits the chamber through the quartz window and is focused on the 
entrance slit of a monochromator (Jobin Yvon-Spex, Model HR460) with a diode 



array detector (Princeton Instruments, model PDA-1024 ST121). The grating (1200 
grooves mm$^{-1}$, blazed at 330 nm) gives a dispersion of ~0.043 nm/pixel, and the
detector has 1024 channels, giving each spectrum a total wavelength range of ~44 nm.
Spectra can be collected in different wavelength ranges by moving the grating motor. 
Changes in grating position as well as temperature lead to changes in dispersion of the 
light beam on the detector. This is taken into account in the least squares analysis by 
allowing for shifting or linear compression/expansion in one or more reference spectra 
along the wavelength axis to obtain the best fit. The use of such techniques is standard 
and user controlled to correlate wavelengths with channels of the detector. Absolute 
dispersion and wavelength were calibrated using a mercury lamp spectrum that was 
recorded daily and at the beginning of each experiment. 

The analytes added to the chamber were NO2 and O3 at a total pressure of ~ 1 
atm at room temperature in dry ultrapure air (Scott-Marrin, Riverside, CA). The 
wavelength range typically used to measure NO2 by DOAS is 340 - 380 nm. Although 
the air was dry (relative humidity < 0.8%), even small amounts of water react with 
NO2 to form HONO [8]. As a result HONO is almost always present in detectable 
quantities with NO2. HONO is also typically measured using the 340 - 380 nm 
wavelength range, thus the mixture of HONO and NO2 was used as a convenient test 
case for the new DOAS analysis technique. The addition of O3 leads to formation of
NO3 radicals (NO2 + O3 → NO3 + O2). Analysis for O3 is typically carried out in a
different wavelength range, 290 - 330 nm. It should be noted that a wavelength range 
for analysis is usually that in which the cross sections are highest for that analyte in 
order to optimize the detection limits. O3 continues to absorb at wavelengths > 330 
nm, albeit with absorption cross sections that are lower by a factor of 100 or more 
[20] compared to those at shorter wavelengths. Another test of the technique was to 
determine if it could identify the presence of this third component, O3, in the 340 - 
380 nm range where its detection is not optimal. NO3 analysis was carried out in a 
different range, 600 - 640 and 640 - 680 nm, and is not discussed here. 

In addition to the new DOAS analysis technique introduced in this work, the 
typical linear least squares analysis was carried out on HONO and NO2 using MFC [5] 
for which reference spectra are needed. A reference spectrum for NO2 was generated 
by adding a known quantity of NO2 to the chamber and collecting DOAS spectra with 
the instrumentation described above. Pure samples of HONO are difficult to generate 
without the presence of NO2, thus HONO reference spectra were generated from 
published cross sections [31 H] which were convoluted to the dispersion and resolution 
of our spectrometer. Reference spectra for O3 were generated from published cross 
sections [20] also converted to the dispersion and resolution of the spectrometer. 

Chemicals used in these experiments are as follows: Gaseous NO2 was synthesized 
by reaction of gaseous NO (Matheson, 99%) which was first passed through a cold trap 
at 195 K to remove impurities such as HNO3, with an excess of O2 (Oxygen Services 
Co., 99.993%). The mixture was allowed to react for 2 hrs. and then purified by 
condensing the NO2 at 195 K to pump away excess O2. Gaseous O3 was generated as 
a mixture in O2 using a commercial ozonizer (Polymetrics, Model T-816). 
Figure 4: Schematic of chamber, instrumentation, and optical setup used to make
DOAS measurements of gaseous mixtures. 



3.2 Computational Results 

We report here the computational results for the proposed method. In the first
example, we fit the known reference spectra of NO2 and HONO to the DOAS data. The
method identifies an a priori unknown trace gas, O3 (ozone), from the fitting residuals.
The results are shown in a series of plots, Fig. 2 to Fig. 6. We use 11 sets of data
corresponding to different reaction times and hence different gas concentrations (X has
11 columns) from the experiment. Fig. 2 illustrates the data preprocessing (EMD)
described earlier, which removes the fastest and slowest varying components. The Huber
fitting results are presented in Fig. 5, which shows the coefficients of NO2 and HONO
in the 11 mixtures in comparison with the concentrations determined using the least
squares fitting technique. These coefficients determined using the hybrid $\ell_1/\ell_2$ fitting
technique are all non-negative as well as in very good quantitative agreement with
values from least squares fitting. The fitting residual is in the third plot of Fig. 5.
Though some structure can be seen in the residuals, it is not clear if there are other
spectral structures embedded in the fitting residuals. Further identification was
therefore done by JADE. For the 11 residuals, we vary the number of independent
components in the JADE computation. We observed that the structure of the first plot
in Fig. 6 remains approximately invariant as the number varies. This invariance suggests
that it is a hidden trace gas in the fitting residual. It can be seen that the
identified structure resembles O3 in many peak locations, especially in the region 340-350
nm. It should be noted that the $\ell_1/\ell_2$ fitting technique currently does not incorporate




Figure 5: (Top row) comparison of the hybrid $\ell_1/\ell_2$ fitting and least squares
techniques; HONO coefficients with 2s errors (left), NO2 coefficients with 2s errors (right),
showing good quantitative agreement for eleven mixture spectra collected sequentially
over 11 minutes. Corresponding concentrations in molecules cm$^{-3}$ are provided on the
right axis for each plot. (Bottom row) one fitting residual from robust data fitting.



shifting and squeezing of spectra to optimize fitting, but this can be implemented in
the future. Spectral shifts and squeezes are often used in DOAS analysis routines to
account for changes in grating dispersion due to temperature fluctuations and grating
positioning accuracy [18, 19].

The second example uses the same set of data (11 mixtures), however we only 
use the reference spectrum of NO2 to fit the data. Ideally, we should recover O3 and 
HONO from the residuals. The two identified hidden spectral signals are in Fig. 7 and
Fig. 8. The recovered fits are recognizable as HONO and O3 upon comparison with
reference spectra, demonstrating the ability of the technique to identify absorption
features without the use of reference spectra during the fitting procedure. While the
$\ell_1/\ell_2$ technique is demonstrated here for laboratory DOAS data with three
components, its utility lies in the analysis of atmospheric DOAS spectra, which are more
complex. The least squares method works best when reference spectra for all known 
absorbing species are used to carry out the fitting, i.e., when the fit residuals are un- 
structured and do not vary considerably with wavelength. Given that this condition is 
rarely satisfied for complex atmospheric measurements, the method described here is 





Figure 6: Recovered O3 and its absorption cross section [20] for comparison 

complementary in that it can identify species that are either not known to be present 
or do not yet have available published cross sections. In addition, the results in Fig. 5
show its value as an alternative stand-alone technique for analyzing DOAS spectra
with the use of appropriate reference spectra. 

4 Concluding Remarks 

We developed a semi-blind source separation method for retrieving the concentrations 
and performing identifications of trace gases from DOAS spectra. The method is 
designed to identify potentially hidden trace gases after fitting the known trace gases 
to the data, which is a challenging problem. Our method can be useful for separating 
unknown source signals from the residuals after any known reference spectra have 
been first deployed to fit the data. The first novelty of the method is to employ 
the multi-resolution analysis (EMD) to remove the slowest varying component from 
the data. The removal of such components relies on a polynomial fit in the existing 
methods. Different polynomials may produce different results, and the degree of 
the fitting polynomial is often empirically defined. The multi-resolution approach 
avoids specifying the order of polynomial, and it extracts the slow component in an 
automatic fashion. The second novelty is to use a hybrid $\ell_1/\ell_2$ interpolated norm
(Huber function) to fit the data, which reduced the effects of outliers and kept the
concentrations non-negative. Lastly, a multi-channel signal decomposition method
(JADE) produced encouraging results on extracting hidden source signals from the 
fitting residuals. While use of the least squares fitting procedure for atmospheric data 
can quantify several trace species simultaneously, typical fit residuals often suggest 
there are remaining absorbers. In some cases, species can be inferred based on known 
atmospheric chemistry, e.g., HONO is often present in NO2 mixtures. The major 
strength of the technique described here is its ability to be used either with existing 
published reference spectra for quantification or without references for identification 
of new absorbers. Numerical results on DOAS data show the promising potential of 
our method on both trace gas recovery and quantification. 
Figure 7: Top plot is the identified spectral structure 1 compared to the spectral
reference of HONO (bottom).




Figure 8: Top plot is the identified spectral structure 2 compared to the reference
spectrum for O3 (bottom).